Load the packages to be used and the gapminder data set.
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(forcats))
suppressPackageStartupMessages(library(knitr))
suppressPackageStartupMessages(library(gapminder))
Factors are how R stores categorical data: each value is stored as an integer code that points into a set of labels, and the values a factor can take on are called levels. It is important to check variable types, because what you think are characters may actually be stored numerically as factor codes. Let’s explore the data frame.
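A quick illustration of the point above, using a small hypothetical factor rather than gapminder: a factor stores integer codes plus a set of level labels, so `as.integer()` exposes the codes while `levels()` returns the labels. This is a base R sketch with made-up values.

```r
# A small, hypothetical factor (not taken from gapminder)
continents <- factor(c("Europe", "Asia", "Europe", "Africa"))

levels(continents)      # "Africa" "Asia" "Europe" (alphabetical by default)
as.integer(continents)  # 3 2 3 1: the integer codes actually stored
typeof(continents)      # "integer": the underlying storage type
```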
summary(gapminder)
country continent year lifeExp
Afghanistan: 12 Africa :624 Min. :1952 Min. :23.60
Albania : 12 Americas:300 1st Qu.:1966 1st Qu.:48.20
Algeria : 12 Asia :396 Median :1980 Median :60.71
Angola : 12 Europe :360 Mean :1980 Mean :59.47
Argentina : 12 Oceania : 24 3rd Qu.:1993 3rd Qu.:70.85
Australia : 12 Max. :2007 Max. :82.60
(Other) :1632
pop gdpPercap
Min. :6.001e+04 Min. : 241.2
1st Qu.:2.794e+06 1st Qu.: 1202.1
Median :7.024e+06 Median : 3531.8
Mean :2.960e+07 Mean : 7215.3
3rd Qu.:1.959e+07 3rd Qu.: 9325.5
Max. :1.319e+09 Max. :113523.1
Before beginning the factor management exercise, let’s check the class of each variable in the gapminder data set to find out which ones are factors.
class(gapminder$country)
[1] "factor"
class(gapminder$continent)
[1] "factor"
class(gapminder$year)
[1] "integer"
class(gapminder$lifeExp)
[1] "numeric"
class(gapminder$pop)
[1] "integer"
class(gapminder$gdpPercap)
[1] "numeric"
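Rather than calling `class()` on each column in turn, a single `sapply()` call can report every column's class at once. The sketch below uses a small hypothetical data frame with gapminder-like column types (the values are made up) so it runs without the gapminder package; with gapminder loaded, `sapply(gapminder, class)` should work the same way.

```r
# Hypothetical two-row data frame mimicking gapminder's column types
toy <- data.frame(
  country   = factor(c("Country A", "Country B")),
  continent = factor(c("Americas", "Americas")),
  year      = c(2002L, 2007L),
  lifeExp   = c(70.5, 72.25),
  pop       = c(1000L, 2000L),
  gdpPercap = c(1500.5, 2500.25)
)

# One call returns a named character vector of column classes
col_classes <- sapply(toy, class)
col_classes
```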
We see that we have two factors in this data set. Let’s see the levels for “country” and “continent” before we begin filtering the data.
nlevels(gapminder$country)
[1] 142
nlevels(gapminder$continent)
[1] 5
levels(gapminder$country)
[1] "Afghanistan" "Albania"
[3] "Algeria" "Angola"
[5] "Argentina" "Australia"
[7] "Austria" "Bahrain"
[9] "Bangladesh" "Belgium"
[11] "Benin" "Bolivia"
[13] "Bosnia and Herzegovina" "Botswana"
[15] "Brazil" "Bulgaria"
[17] "Burkina Faso" "Burundi"
[19] "Cambodia" "Cameroon"
[21] "Canada" "Central African Republic"
[23] "Chad" "Chile"
[25] "China" "Colombia"
[27] "Comoros" "Congo, Dem. Rep."
[29] "Congo, Rep." "Costa Rica"
[31] "Cote d'Ivoire" "Croatia"
[33] "Cuba" "Czech Republic"
[35] "Denmark" "Djibouti"
[37] "Dominican Republic" "Ecuador"
[39] "Egypt" "El Salvador"
[41] "Equatorial Guinea" "Eritrea"
[43] "Ethiopia" "Finland"
[45] "France" "Gabon"
[47] "Gambia" "Germany"
[49] "Ghana" "Greece"
[51] "Guatemala" "Guinea"
[53] "Guinea-Bissau" "Haiti"
[55] "Honduras" "Hong Kong, China"
[57] "Hungary" "Iceland"
[59] "India" "Indonesia"
[61] "Iran" "Iraq"
[63] "Ireland" "Israel"
[65] "Italy" "Jamaica"
[67] "Japan" "Jordan"
[69] "Kenya" "Korea, Dem. Rep."
[71] "Korea, Rep." "Kuwait"
[73] "Lebanon" "Lesotho"
[75] "Liberia" "Libya"
[77] "Madagascar" "Malawi"
[79] "Malaysia" "Mali"
[81] "Mauritania" "Mauritius"
[83] "Mexico" "Mongolia"
[85] "Montenegro" "Morocco"
[87] "Mozambique" "Myanmar"
[89] "Namibia" "Nepal"
[91] "Netherlands" "New Zealand"
[93] "Nicaragua" "Niger"
[95] "Nigeria" "Norway"
[97] "Oman" "Pakistan"
[99] "Panama" "Paraguay"
[101] "Peru" "Philippines"
[103] "Poland" "Portugal"
[105] "Puerto Rico" "Reunion"
[107] "Romania" "Rwanda"
[109] "Sao Tome and Principe" "Saudi Arabia"
[111] "Senegal" "Serbia"
[113] "Sierra Leone" "Singapore"
[115] "Slovak Republic" "Slovenia"
[117] "Somalia" "South Africa"
[119] "Spain" "Sri Lanka"
[121] "Sudan" "Swaziland"
[123] "Sweden" "Switzerland"
[125] "Syria" "Taiwan"
[127] "Tanzania" "Thailand"
[129] "Togo" "Trinidad and Tobago"
[131] "Tunisia" "Turkey"
[133] "Uganda" "United Kingdom"
[135] "United States" "Uruguay"
[137] "Venezuela" "Vietnam"
[139] "West Bank and Gaza" "Yemen, Rep."
[141] "Zambia" "Zimbabwe"
levels(gapminder$continent)
[1] "Africa" "Americas" "Asia" "Europe" "Oceania"
Now let’s work towards dropping “Oceania”.
We will start by filtering the gapminder data to remove observations associated with the continent of Oceania.
No_Oceania <- gapminder %>%
  filter(continent != "Oceania")  # keep every continent except Oceania
No_Oceania %>%
  sample_frac(0.1) %>%  # show a random 10% of the rows rather than the full data
  knitr::kable(format = "markdown")
| country | continent | year | lifeExp | pop | gdpPercap |
|---|---|---|---|---|---|
| Austria | Europe | 1962 | 69.540 | 7129864 | 10750.7211 |
| Lebanon | Asia | 1977 | 66.099 | 3115787 | 8659.6968 |
| Tunisia | Africa | 1997 | 71.973 | 9231669 | 4876.7986 |
| Swaziland | Africa | 1962 | 44.992 | 370006 | 1856.1821 |
| Afghanistan | Asia | 1992 | 41.674 | 16317921 | 649.3414 |
| Haiti | Americas | 2002 | 58.137 | 7607651 | 1270.3649 |
| Mexico | Americas | 1967 | 60.110 | 47995559 | 5754.7339 |
| Myanmar | Asia | 1972 | 53.070 | 28466390 | 357.0000 |
| Liberia | Africa | 1967 | 41.536 | 1279406 | 713.6036 |
| Nepal | Asia | 2002 | 61.340 | 25873917 | 1057.2063 |
| El Salvador | Americas | 2007 | 71.878 | 6939688 | 5728.3535 |
| Canada | Americas | 1987 | 76.860 | 26549700 | 26626.5150 |
| Congo, Rep. | Africa | 1972 | 54.907 | 1340458 | 3213.1527 |
| France | Europe | 1952 | 67.410 | 42459667 | 7029.8093 |
| Niger | Africa | 1952 | 37.444 | 3379468 | 761.8794 |
| Turkey | Europe | 1962 | 52.098 | 29788695 | 2322.8699 |
| Benin | Africa | 2002 | 54.406 | 7026113 | 1372.8779 |
| Japan | Asia | 1967 | 71.430 | 100825279 | 9847.7886 |
| Jamaica | Americas | 1982 | 71.210 | 2298309 | 6068.0513 |
| Congo, Dem. Rep. | Africa | 1992 | 45.548 | 41672143 | 457.7192 |
| Albania | Europe | 1982 | 70.420 | 2780097 | 3630.8807 |
| Libya | Africa | 1992 | 68.755 | 4364501 | 9640.1385 |
| Sweden | Europe | 1972 | 74.720 | 8122293 | 17832.0246 |
| Mongolia | Asia | 1952 | 42.244 | 800663 | 786.5669 |
| Equatorial Guinea | Africa | 1967 | 38.987 | 259864 | 915.5960 |
| Serbia | Europe | 1957 | 61.685 | 7271135 | 4981.0909 |
| Comoros | Africa | 1982 | 52.933 | 348643 | 1267.1001 |
| Equatorial Guinea | Africa | 1987 | 45.664 | 341244 | 966.8968 |
| Syria | Asia | 1957 | 48.284 | 4149908 | 2117.2349 |
| Bosnia and Herzegovina | Europe | 1997 | 73.244 | 3607000 | 4766.3559 |
| Djibouti | Africa | 1977 | 46.519 | 228694 | 3081.7610 |
| Mexico | Americas | 1982 | 67.405 | 71640904 | 9611.1475 |
| Romania | Europe | 2007 | 72.476 | 22276056 | 10808.4756 |
| Namibia | Africa | 1962 | 48.386 | 621392 | 3173.2156 |
| Syria | Asia | 1972 | 57.296 | 6701172 | 2571.4230 |
| Singapore | Asia | 1972 | 69.521 | 2152400 | 8597.7562 |
| Guinea | Africa | 1967 | 37.197 | 3451418 | 708.7595 |
| Chile | Americas | 1957 | 56.074 | 7048426 | 4315.6227 |
| Korea, Dem. Rep. | Asia | 1962 | 56.656 | 10917494 | 1621.6936 |
| Guinea-Bissau | Africa | 1987 | 41.245 | 927524 | 736.4154 |
| Belgium | Europe | 1987 | 75.350 | 9870200 | 22525.5631 |
| Morocco | Africa | 1957 | 45.423 | 11406350 | 1642.0023 |
| Denmark | Europe | 1962 | 72.350 | 4646899 | 13583.3135 |
| Egypt | Africa | 1977 | 53.319 | 38783863 | 2785.4936 |
| Equatorial Guinea | Africa | 1982 | 43.662 | 285483 | 927.8253 |
| Yemen, Rep. | Asia | 1952 | 32.548 | 4963829 | 781.7176 |
| Vietnam | Asia | 1962 | 45.363 | 33796140 | 772.0492 |
| Burkina Faso | Africa | 1972 | 43.591 | 5433886 | 854.7360 |
| Equatorial Guinea | Africa | 1972 | 40.516 | 277603 | 672.4123 |
| Bolivia | Americas | 1997 | 62.050 | 7693188 | 3326.1432 |
| Hungary | Europe | 1962 | 67.960 | 10063000 | 7550.3599 |
| Tunisia | Africa | 1977 | 59.837 | 6005061 | 3120.8768 |
| Vietnam | Asia | 1987 | 62.820 | 62826491 | 820.7994 |
| Argentina | Americas | 1982 | 69.942 | 29341374 | 8997.8974 |
| Bahrain | Asia | 1962 | 56.923 | 171863 | 12753.2751 |
| Thailand | Asia | 1982 | 64.597 | 48827160 | 2393.2198 |
| Uganda | Africa | 1982 | 49.849 | 12939400 | 682.2662 |
| Canada | Americas | 1962 | 71.300 | 18985849 | 13462.4855 |
| Belgium | Europe | 2002 | 78.320 | 10311970 | 30485.8838 |
| Italy | Europe | 2007 | 80.546 | 58147733 | 28569.7197 |
| Thailand | Asia | 1957 | 53.630 | 25041917 | 793.5774 |
| Iraq | Asia | 2002 | 57.046 | 24001816 | 4390.7173 |
| Cuba | Americas | 1977 | 72.649 | 9537988 | 6380.4950 |
| Morocco | Africa | 2002 | 69.615 | 31167783 | 3258.4956 |
| Greece | Europe | 1967 | 71.000 | 8716441 | 8513.0970 |
| Rwanda | Africa | 1962 | 43.000 | 3051242 | 597.4731 |
| Cote d'Ivoire | Africa | 1967 | 47.350 | 4744870 | 2052.0505 |
| Korea, Dem. Rep. | Asia | 1977 | 67.159 | 16325320 | 4106.3012 |
| Niger | Africa | 2007 | 56.867 | 12894865 | 619.6769 |
| Jamaica | Americas | 2002 | 72.047 | 2664659 | 6994.7749 |
| Kuwait | Asia | 1977 | 69.343 | 1140357 | 59265.4771 |
| Pakistan | Asia | 1972 | 51.929 | 69325921 | 1049.9390 |
| Italy | Europe | 1977 | 73.480 | 56059245 | 14255.9847 |
| Norway | Europe | 1982 | 75.970 | 4114787 | 26298.6353 |
| Nicaragua | Americas | 1972 | 55.151 | 2182908 | 4688.5933 |
| Botswana | Africa | 1967 | 53.298 | 553541 | 1214.7093 |
| Greece | Europe | 1957 | 67.860 | 8096218 | 4916.2999 |
| Tanzania | Africa | 1987 | 51.535 | 23040630 | 831.8221 |
| South Africa | Africa | 1952 | 45.009 | 14264935 | 4725.2955 |
| Eritrea | Africa | 1987 | 46.453 | 2915959 | 521.1341 |
| Trinidad and Tobago | Americas | 1962 | 64.900 | 887498 | 4997.5240 |
| Iraq | Asia | 2007 | 59.545 | 27499638 | 4471.0619 |
| Israel | Asia | 1997 | 78.269 | 5531387 | 20896.6092 |
| Albania | Europe | 1962 | 64.820 | 1728137 | 2312.8890 |
| Congo, Rep. | Africa | 1967 | 52.040 | 1179760 | 2677.9396 |
| Algeria | Africa | 2007 | 72.301 | 33333216 | 6223.3675 |
| Cameroon | Africa | 1962 | 42.643 | 5793633 | 1399.6074 |
| Cote d'Ivoire | Africa | 2002 | 46.832 | 16252726 | 1648.8008 |
| Myanmar | Asia | 1982 | 58.056 | 34680442 | 424.0000 |
| Korea, Rep. | Asia | 1982 | 67.123 | 39326000 | 5622.9425 |
| Denmark | Europe | 2002 | 77.180 | 5374693 | 32166.5001 |
| Turkey | Europe | 1957 | 48.079 | 25670939 | 2218.7543 |
| Libya | Africa | 1952 | 42.723 | 1019729 | 2387.5481 |
| Thailand | Asia | 1997 | 67.521 | 60216677 | 5852.6255 |
| Germany | Europe | 1977 | 72.500 | 78160773 | 20512.9212 |
| Portugal | Europe | 1977 | 70.410 | 9662600 | 10172.4857 |
| Puerto Rico | Americas | 2007 | 78.746 | 3942491 | 19328.7090 |
| Brazil | Americas | 2002 | 71.006 | 179914212 | 8131.2128 |
| Sierra Leone | Africa | 2007 | 42.568 | 6144562 | 862.5408 |
| Mauritius | Africa | 1957 | 58.089 | 609816 | 2034.0380 |
| Ghana | Africa | 1982 | 53.744 | 11400338 | 876.0326 |
| Bulgaria | Europe | 1997 | 70.320 | 8066057 | 5970.3888 |
| Ethiopia | Africa | 1982 | 44.916 | 38111756 | 577.8607 |
| Somalia | Africa | 1967 | 38.977 | 3428839 | 1284.7332 |
| Jordan | Asia | 1987 | 65.869 | 2820042 | 4448.6799 |
| Peru | Americas | 1967 | 51.445 | 12132200 | 5788.0933 |
| Rwanda | Africa | 1972 | 44.600 | 3992121 | 590.5807 |
| Bulgaria | Europe | 1972 | 70.900 | 8576200 | 6597.4944 |
| Egypt | Africa | 1962 | 46.992 | 28173309 | 1693.3359 |
| Mongolia | Asia | 2007 | 66.803 | 2874127 | 3095.7723 |
| Finland | Europe | 2007 | 79.313 | 5238460 | 33207.0844 |
| Honduras | Americas | 1977 | 57.402 | 3055235 | 3203.2081 |
| Sao Tome and Principe | Africa | 1982 | 60.351 | 98593 | 1890.2181 |
| Cuba | Americas | 1972 | 70.723 | 8831348 | 5305.4453 |
| Sweden | Europe | 1992 | 78.160 | 8718867 | 23880.0168 |
| Czech Republic | Europe | 1997 | 74.010 | 10300707 | 16048.5142 |
| South Africa | Africa | 1987 | 60.834 | 35933379 | 7825.8234 |
| Czech Republic | Europe | 1972 | 70.290 | 9862158 | 13108.4536 |
| Montenegro | Europe | 1987 | 74.865 | 569473 | 11732.5102 |
| Denmark | Europe | 1987 | 74.800 | 5127024 | 25116.1758 |
| Iceland | Europe | 1982 | 76.990 | 233997 | 23269.6075 |
| Afghanistan | Asia | 1997 | 41.763 | 22227415 | 635.3414 |
| Guatemala | Americas | 1972 | 53.738 | 5149581 | 4031.4083 |
| Niger | Africa | 1992 | 47.391 | 8392818 | 581.1827 |
| Central African Republic | Africa | 1987 | 50.485 | 2840009 | 844.8764 |
| Paraguay | Americas | 1982 | 66.874 | 3366439 | 4258.5036 |
| Trinidad and Tobago | Americas | 1987 | 69.582 | 1191336 | 7388.5978 |
| Lebanon | Asia | 1972 | 65.421 | 2680018 | 7486.3843 |
| Costa Rica | Americas | 1987 | 74.752 | 2799811 | 5629.9153 |
| Cote d'Ivoire | Africa | 1972 | 49.801 | 6071696 | 2378.2011 |
| Angola | Africa | 2007 | 42.731 | 12420476 | 4797.2313 |
| Austria | Europe | 1992 | 76.040 | 7914969 | 27042.0187 |
| Belgium | Europe | 1992 | 76.460 | 10045622 | 25575.5707 |
| Congo, Dem. Rep. | Africa | 1962 | 42.122 | 17486434 | 896.3146 |
| Guinea | Africa | 1952 | 33.609 | 2664249 | 510.1965 |
| Trinidad and Tobago | Americas | 2002 | 68.976 | 1101832 | 11460.6002 |
| Rwanda | Africa | 1992 | 23.599 | 7290203 | 737.0686 |
| Afghanistan | Asia | 1987 | 40.822 | 13867957 | 852.3959 |
| Senegal | Africa | 1997 | 60.187 | 9535314 | 1392.3683 |
| Panama | Americas | 1992 | 72.462 | 2484997 | 6618.7431 |
| Kenya | Africa | 1957 | 44.686 | 7454779 | 944.4383 |
| Norway | Europe | 1957 | 73.440 | 3491938 | 11653.9730 |
| Panama | Americas | 1972 | 66.216 | 1616384 | 5364.2497 |
| Iran | Asia | 1992 | 65.742 | 60397973 | 7235.6532 |
| Sudan | Africa | 1982 | 50.338 | 20367053 | 1895.5441 |
| Lesotho | Africa | 2002 | 44.593 | 2046772 | 1275.1846 |
| West Bank and Gaza | Asia | 2002 | 72.370 | 3389578 | 4515.4876 |
| Slovenia | Europe | 1987 | 72.250 | 1945870 | 18678.5349 |
| Angola | Africa | 1962 | 34.000 | 4826015 | 4269.2767 |
| Mauritius | Africa | 1972 | 62.944 | 851334 | 2575.4842 |
| Hong Kong, China | Asia | 1967 | 70.000 | 3722800 | 6197.9628 |
| Hong Kong, China | Asia | 2002 | 81.495 | 6762476 | 30209.0152 |
| Slovak Republic | Europe | 1977 | 70.450 | 4827803 | 10922.6640 |
| Bangladesh | Asia | 1977 | 46.923 | 80428306 | 659.8772 |
| Gabon | Africa | 1967 | 44.598 | 489004 | 8358.7620 |
| Denmark | Europe | 1972 | 73.470 | 4991596 | 18866.2072 |
| Korea, Rep. | Asia | 1987 | 69.810 | 41622000 | 8533.0888 |
| Swaziland | Africa | 1967 | 46.633 | 420690 | 2613.1017 |
| Cambodia | Asia | 1962 | 43.415 | 6083619 | 496.9136 |
| Japan | Asia | 1987 | 78.670 | 122091325 | 22375.9419 |
| Korea, Rep. | Asia | 1992 | 72.244 | 43805450 | 12104.2787 |
| Chile | Americas | 1997 | 75.816 | 14599929 | 10118.0532 |
| Spain | Europe | 1972 | 73.060 | 34513161 | 10638.7513 |
| Central African Republic | Africa | 1957 | 37.464 | 1392284 | 1190.8443 |
| Djibouti | Africa | 1962 | 39.693 | 89898 | 3020.9893 |
| Uruguay | Americas | 1992 | 72.752 | 3149262 | 8137.0048 |
| Swaziland | Africa | 1992 | 58.474 | 962344 | 3553.0224 |
| Panama | Americas | 1977 | 68.681 | 1839782 | 5351.9121 |
Let’s check how many levels there are and whether we need to remove unused factor levels.
nlevels(No_Oceania$continent)
[1] 5
levels(No_Oceania$continent)
[1] "Africa" "Americas" "Asia" "Europe" "Oceania"
We see that there are still 5 levels: filtering removed the Oceania rows, but “Oceania” remains as an unused factor level. Let’s drop the unused levels.
Drop_Oceania <- No_Oceania %>%
droplevels()
nlevels(Drop_Oceania$continent)
[1] 4
levels(Drop_Oceania$continent)
[1] "Africa" "Americas" "Asia" "Europe"
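The same behaviour can be reproduced on a small hypothetical data frame: subsetting rows leaves the Oceania level in place (and `table()` still reports it, with a count of zero), while `droplevels()` on a data frame removes unused levels from every factor column at once. In gapminder this means `Drop_Oceania$country` should also lose the two Oceania countries, Australia and New Zealand. For a single factor, forcats offers `fct_drop()`; the sketch below sticks to base R.

```r
# Hypothetical data: one Oceania row among others
df <- data.frame(
  country   = factor(c("Country A", "Country B", "Country C")),
  continent = factor(c("Asia", "Oceania", "Europe"))
)

# Base R equivalent of filter(continent != "Oceania")
no_oceania <- df[df$continent != "Oceania", ]
levels(no_oceania$continent)   # "Oceania" is still a level
table(no_oceania$continent)    # ... with a count of 0

# droplevels() on a data frame cleans every factor column
dropped <- droplevels(no_oceania)
levels(dropped$continent)      # "Asia" "Europe"
levels(dropped$country)        # "Country A" "Country C"
```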
Note that the factor levels are in alphabetical order, which is R’s default. Let’s use forcats to re-order the factor levels; there are several ways to do this.
One way to re-order is by frequency. For example, fct_infreq() orders the continents from most to least frequent in the data.
Drop_Oceania$continent %>%
fct_infreq() %>%
levels()
[1] "Africa" "Asia" "Europe" "Americas"
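To see roughly what `fct_infreq()` does, here is a base R sketch on a hypothetical vector: count each level with `table()`, sort the counts in decreasing order, and use the sorted names as the new level order. It is for illustration only and does not try to match forcats on edge cases such as ties or missing values.

```r
# Hypothetical continent labels for six observations
x <- factor(c("Europe", "Asia", "Europe", "Africa", "Europe", "Asia"))

counts  <- table(x)                          # Africa 1, Asia 2, Europe 3
by_freq <- names(sort(counts, decreasing = TRUE))

x_infreq <- factor(x, levels = by_freq)      # same values, new level order
levels(x_infreq)                             # "Europe" "Asia" "Africa"
```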
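We can also re-order by the values of another variable, such as life expectancy or GDP per capita. Here we keep the years after 1979, compute the median GDP per capita for each continent and year, and then order the continents by their minimum GDP per capita.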
Drop_Oceania_1980 <- Drop_Oceania %>%
filter(year > 1979) %>%
group_by(continent, year) %>%
mutate(mediangdp = median(gdpPercap))
Drop_Oceania_1980
fct_reorder(Drop_Oceania_1980$continent, Drop_Oceania_1980$gdpPercap, min) %>%
levels() %>% head()
[1] "Africa" "Asia" "Americas" "Europe"
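Conceptually, `fct_reorder()` summarises the second variable within each level (here with `min`) and sorts the levels by that summary, smallest first by default. The same idea in base R, on hypothetical values with placeholder continent labels A, B and C, uses `tapply()` for the per-level summary; base R's `reorder()` does both steps at once.

```r
# Hypothetical GDP-per-capita values for three placeholder continents
continent <- factor(c("A", "A", "B", "B", "C"))
gdp       <- c(500, 100, 300, 400, 50)

# Step 1: summarise gdp within each level
per_level_min <- tapply(gdp, continent, min)   # A 100, B 300, C 50

# Step 2: sort the levels by that summary (ascending)
manual_order <- names(sort(per_level_min))     # "C" "A" "B"

# Base R's reorder() performs both steps in one call
base_reordered <- reorder(continent, gdp, FUN = min)
levels(base_reordered)                         # "C" "A" "B"
```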
Now we can put this into graph form.
ggplot(Drop_Oceania_1980,
       aes(x = year, y = mediangdp,  # refer to columns by name inside aes(), not with $
           color = fct_reorder2(continent, year, mediangdp))) +
  geom_line() +
  labs(color = "Continent") +
  xlab("Year") + ylab("Median GDP per capita") +
  ggtitle("Median GDP per Capita by Continent, 1982-2007")
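We see that reordering the levels with fct_reorder2() organizes the legend in the same order as the lines: by default it sorts the continents by their median GDP per capita in the last year, largest first, so the legend matches the right-hand ends of the lines.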
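As a rough illustration of that ordering rule: by default `fct_reorder2()` takes, for each level, the `y` value paired with the largest `x` (here, the final year) and sorts the levels by those values in decreasing order. The base R sketch below mimics this on hypothetical data with placeholder continents; it is not forcats' actual implementation.

```r
# Hypothetical median GDP series for three placeholder continents over two years
d <- data.frame(
  continent = factor(c("A", "A", "B", "B", "C", "C")),
  year      = c(1980, 2007, 1980, 2007, 1980, 2007),
  median_y  = c(10, 30, 20, 15, 5, 40)
)

# For each level, take the y value at the largest x (the last year)
last_y <- sapply(split(d, d$continent),
                 function(g) g$median_y[which.max(g$year)])
# last_y: A 30, B 15, C 40

# Sort levels by that value, largest first (fct_reorder2's default direction)
legend_order <- names(sort(last_y, decreasing = TRUE))
legend_order   # "C" "A" "B"
```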
Let’s try saving some wrangled data into a new file.
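trial <- gapminder %>%
  filter(year == 2007)  # year is an integer, so compare with a number
write.csv(trial, file = "assignment5_stat545.csv", row.names = FALSE)  # no extra row-number column
Now let’s import this file to see if we can read it.
read.csv("assignment5_stat545.csv")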
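One caveat worth checking when saving wrangled data: a CSV file stores only text, so a custom factor level order is lost on the round trip, whereas `saveRDS()`/`readRDS()` keep the R object, levels included. The sketch below uses a hypothetical data frame and temporary files; `stringsAsFactors = TRUE` is set explicitly because the default became `FALSE` in R 4.0.0.

```r
# Hypothetical data with a non-alphabetical level order
df <- data.frame(continent = factor(c("Europe", "Asia", "Africa"),
                                    levels = c("Europe", "Asia", "Africa")))

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")

# CSV round trip: levels come back in alphabetical order
write.csv(df, csv_file, row.names = FALSE)
from_csv <- read.csv(csv_file, stringsAsFactors = TRUE)
levels(from_csv$continent)   # "Africa" "Asia" "Europe"

# RDS round trip: the original level order survives
saveRDS(df, rds_file)
from_rds <- readRDS(rds_file)
levels(from_rds$continent)   # "Europe" "Asia" "Africa"
```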
Here is an old graph from assignment 3. It shows the population-weighted mean life expectancy of each continent over time in the gapminder data set.
gapminder %>%
group_by(continent, year) %>%
summarise(mean_lifeExp_weighted = weighted.mean(lifeExp, pop)) %>%
ggplot(aes(year, mean_lifeExp_weighted))+
geom_point(aes(colour = continent))
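The population-weighted mean gives each country's life expectancy a weight proportional to its population, i.e. sum(lifeExp × pop) / sum(pop). A quick check with hypothetical numbers that `weighted.mean()` computes exactly this:

```r
# Hypothetical life expectancies and populations for three countries
life_exp <- c(60, 70, 80)
pop      <- c(1e6, 3e6, 6e6)

by_formula <- sum(life_exp * pop) / sum(pop)   # (60*1 + 70*3 + 80*6) / 10 = 75
by_base_r  <- weighted.mean(life_exp, pop)

c(by_formula, by_base_r)   # both 75; the unweighted mean would be 70
```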
What can be improved?

1. Add lines connecting the yearly points so we can see what happens between years.
2. Change the axis titles and add a graph title.
3. Re-order the factor levels so the legend follows the order of the lines.
4. Change the y-axis breaks to show every 5 years of life expectancy.
5. Change the graph’s theme.
gapminder %>%
  group_by(continent, year) %>%
  summarise(mean_lifeExp_weighted = weighted.mean(lifeExp, pop)) %>%
  ggplot(aes(year, mean_lifeExp_weighted,
             color = fct_reorder2(continent, year, mean_lifeExp_weighted))) +
  geom_point() +  # no layer-level colour: it would override the reordered mapping
  geom_line() +
  scale_y_continuous(breaks = 5 * (1:17)) +  # a break every 5 years of life expectancy
  xlab("Year") +
  ylab("Weighted Mean Life Expectancy") +
  labs(color = "Continent") +
  ggtitle("Weighted Mean Life Expectancy vs. Time (Years)") +
  theme_minimal()
**Let’s try a different graph and then use plotly to make it interactive.**
Here is the original graph:
gapminder %>%
group_by(year) %>%
mutate(median = median(lifeExp)) %>%
ggplot(aes(year, lifeExp)) +
geom_jitter(aes(colour = (lifeExp < median)), alpha = 0.5)+
facet_wrap(~ continent)
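The colour in this plot is a per-year comparison: `group_by(year)` plus `mutate()` attaches the worldwide median life expectancy for each year to every row, and `lifeExp < median` flags the rows below it, so `TRUE` means *below* the median. A base R sketch of the same computation, using `ave()` on hypothetical values:

```r
# Hypothetical life expectancies for three countries in two years
d <- data.frame(
  year    = c(1952, 1952, 1952, 2007, 2007, 2007),
  lifeExp = c(40, 50, 60, 70, 75, 80)
)

# Per-year median, repeated on every row of that year
d$median <- ave(d$lifeExp, d$year, FUN = median)

# TRUE marks rows below their year's worldwide median
d$below_median <- d$lifeExp < d$median
d
```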
Here is the cleaner version.
gapminder %>%
group_by(year) %>%
mutate(median = median(lifeExp)) %>%
ggplot(aes(year, lifeExp)) +
geom_jitter(aes(colour = (lifeExp < median)), alpha = 0.5)+
facet_wrap(~ continent,nrow = 5, ncol = 1)+
scale_y_continuous(breaks=10*(1:9))+ #change the scale
xlab("Year")+ #change the titles
ylab("Life Expectancy")+
ggtitle("Country life expectancy over time")+
scale_colour_discrete(name = "Country life expectancy < worldwide median?") + # make the legend clearer (TRUE = below the median)
theme(legend.position="right") #select legend position
Now, let’s make this into a plotly graph.
#First, let's save the above graph into "plotly1"
plotly1 <- gapminder %>%
group_by(year) %>%
mutate(median = median(lifeExp)) %>%
ggplot(aes(year, lifeExp)) +
geom_jitter(aes(colour = (lifeExp < median)), alpha = 0.5)+
facet_wrap(~ continent,nrow = 5, ncol = 1)+
scale_y_continuous(breaks=10*(1:9))+
xlab("Year")+
ylab("Life Expectancy")+
ggtitle("Country life expectancy over time")+
scale_colour_discrete(name = "Country life expectancy < worldwide median?") +
theme(legend.position="right")
# Now load the plotly package.
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
# Generate the plotly graph.
ggplotly(plotly1)
We notice that plotly adds some nice features, such as hover tooltips and zooming. However, the layout is worse: the legend is cut off. Let’s see whether moving the legend elsewhere improves the visualization.
plotly2 <- gapminder %>%
group_by(year) %>%
mutate(median = median(lifeExp)) %>%
ggplot(aes(year, lifeExp)) +
geom_jitter(aes(colour = (lifeExp < median)), alpha = 0.5)+
facet_wrap(~ continent,nrow = 5, ncol = 1)+
scale_y_continuous(breaks=10*(1:9))+
xlab("Year")+
ylab("Life Expectancy")+
ggtitle("Country life expectancy over time")+
scale_colour_discrete(name = "Country life expectancy < worldwide median?") +
theme(legend.position="bottom")
# Generate the plotly graph.
ggplotly(plotly2)
Hmmmm. It appears that ggplotly() does not carry over every aesthetic option that ggplot2 offers: moving the legend to the bottom did not help. Anyway, let’s save the interactive plot as a local HTML file.
ggplotly(plotly1) %>%
  htmlwidgets::saveWidget("plotly1.html")  # write the widget to an HTML file
Let’s save the ggplot version using ggsave(). This writes the plot to an image file and lets us set aspects such as its dimensions and resolution.
plotly1
ggsave("plotly1.png", plot = plotly1, width = 20, height = 20, units = "cm", dpi = 300)  # name the plot explicitly
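With `units = "cm"` and `dpi = 300`, the pixel size of the saved PNG follows from converting centimetres to inches (1 in = 2.54 cm) and multiplying by the resolution. A quick calculation for the 20 × 20 cm setting above (exact rounding is left to the graphics device):

```r
# Convert the ggsave() dimensions to an approximate pixel count
width_cm <- 20
dpi      <- 300

width_in <- width_cm / 2.54    # about 7.87 inches
width_px <- width_in * dpi     # about 2362.2 pixels
round(width_px)                # 2362
```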